Generalized Network Growth: from Microscopic Strategies to the Real Internet 
In this paper we present a generalized model for network growth that links the microscopic agent
strategies with the large-scale behavior. The model is intended to reproduce as many features as
possible of the Internet at the Autonomous System (AS) level. In our model the network grows
by adding both new vertices and new edges between old vertices. In the latter case a "rewarding
attachment" takes place, mimicking the disassortative mixing of small routers with larger ones.
We find good agreement between experimental data and the model for the degree distribution,
the betweenness distribution, the clustering coefficient and the correlation functions for the degrees.

PACS numbers: 05.40+j, 64.60Ak, 64.60Fr, 87.10+e 



Networks or graphs are mathematical entities composed of sites (or vertices) connected by links
(or edges). Owing to the apparent simplicity of this definition, many attempts have been made to
describe very different physical situations within this framework. More interestingly, new
unexpected properties (with respect to the traditional approach of Random Graphs due to
P. Erdős and A. Rényi) have been found in a variety of different systems. The Internet, the WWW,
social structures, and even protein interactions display a self-similar distribution
$P(k) \propto k^{-\gamma}$ for the degree $k$ (i.e. the number of links per vertex). From a
theoretical point of view, some of the active ingredients that determine such self-similarity in
the statistical properties of the degree have already been found. Graph properties, however, are
far more complex than the degree distribution alone. Therefore the onset of a complete set of
topological properties in the above real networks remains to be fully explained. The most
interesting case is the Internet, where a full understanding of the statistical properties of the
phenomenon could help in improving its technical features.

In this Letter we present a simple statistical model able to reproduce many statistical
properties of the Internet beyond the degree distribution (on which most present models mainly
focus). This is achieved, as explained below, by relating the microscopic agent strategies to
the macroscopic statistical properties.

One way to describe this complex phenomenology (even when restricting to the topology and
ignoring the different weights that traffic gives to the various links) is to consider the
clustering, correlation and centrality present in the graph.

Clustering measures the presence of parts of the graph denser than the average. The most
immediate measure of clustering is the clustering coefficient $C_i$ of every site $i$. This
quantity gives the probability that two nearest neighbors of a vertex are also neighbors of
each other. The clustering coefficient can then be averaged over the vertices in the structure,
giving the total clustering $\langle C \rangle$, or rather be decomposed by considering the
function $c(k)$, the average clustering coefficient of the vertices with degree $k$.
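
As an illustration of the definitions above, the following minimal Python sketch computes $C_i$, $\langle C \rangle$ and $c(k)$ for a graph stored as a dictionary of neighbour sets. The function names, the adjacency format and the convention $C_i = 0$ for vertices with fewer than two neighbours are assumptions made for this example; they are not taken from the original analysis.

```python
from collections import defaultdict
from itertools import combinations


def local_clustering(adj):
    """Return {vertex: C_i}, the fraction of neighbour pairs that are linked."""
    c = {}
    for v, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            c[v] = 0.0  # assumed convention: C_i = 0 when k_i < 2
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        c[v] = 2.0 * links / (k * (k - 1))
    return c


def clustering_by_degree(adj):
    """Return {k: c(k)}, the mean of C_i over the vertices of degree k."""
    c = local_clustering(adj)
    by_k = defaultdict(list)
    for v, nbrs in adj.items():
        by_k[len(nbrs)].append(c[v])
    return {k: sum(vals) / len(vals) for k, vals in by_k.items()}


# Toy graph: a triangle 0-1-2 with a pendant vertex 3 attached to vertex 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
c = local_clustering(adj)
mean_c = sum(c.values()) / len(c)  # total clustering <C>
```

In this toy graph the two degree-2 vertices have $C_i = 1$, while the degree-3 vertex has $C_i = 1/3$, since only one of its three neighbour pairs is linked.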

Correlation is best represented by the conditional probability $P_c(k'|k)$ that a link belonging
to a node with degree $k$ points to a node with degree $k'$. If this is independent of $k$, we
have $P_c(k'|k) = P_c(k') \sim k' P(k')$. If instead there is a dependence on $k$, we can
establish the strength of the correlation between vertices of different degree. The most
immediate way to compute such a correlation is to consider the quantity

$$\langle k_{nn} \rangle = \sum_{k'} k' P_c(k'|k) \qquad (1)$$

i.e. the average degree of the nearest neighbors of nodes with degree $k$.
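
The quantity in Eq. (1) can be measured on a given graph along the following lines. This is only a sketch: the function name and the dictionary-of-sets representation are assumptions of the example. Since all vertices of degree $k$ carry the same number of links, averaging the per-vertex mean neighbour degree over those vertices is equivalent to the link average in Eq. (1).

```python
from collections import defaultdict


def knn_by_degree(adj):
    """Return {k: <k_nn>(k)}, the mean neighbour degree of vertices of degree k."""
    by_k = defaultdict(list)
    for v, nbrs in adj.items():
        k = len(nbrs)
        if k == 0:
            continue  # isolated vertices have no neighbours to average over
        by_k[k].append(sum(len(adj[u]) for u in nbrs) / k)
    return {k: sum(vals) / len(vals) for k, vals in by_k.items()}


# Toy graph: a star with hub 0 and leaves 1..4 (strongly disassortative).
adj = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
knn = knn_by_degree(adj)
```

For the star, the leaves (degree 1) see a neighbour of degree 4 and the hub (degree 4) sees neighbours of degree 1, so $\langle k_{nn} \rangle$ decreases with $k$.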

Centrality of some vertices with respect to other zones is also a way to consider deviations
from average behavior in the structure. In particular, betweenness and closeness are measures
of the centrality of a site with respect to the other vertices in the graph. The betweenness
$b_i$ of a vertex $i$ gives the probability that site $i$ lies on a shortest path between vertex
$j$ and vertex $k$ (for every $j$ and $k$). If the number of shortest paths between a pair of
vertices $(j, k)$ is $D(j, k)$, we denote by $D_i(j, k)$ the number of such shortest paths
running through $i$. The fraction $g_i(j, k) = D_i(j, k)/D(j, k)$ may be interpreted as the role
played by vertex $i$ in a social relation between two persons $j$ and $k$. The betweenness of
$i$ is defined as the sum of $g_i(j, k)$ over all the connected pairs.
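
The definition of the betweenness can be turned directly into code. The sketch below counts shortest paths with a breadth-first search from every vertex and then sums $g_i(j, k)$ over unordered connected pairs with $j \neq i \neq k$; summing over unordered rather than ordered pairs is an assumption of this example, as are the function names. The triple loop is meant for small graphs only and is not the procedure used for the data in this paper.

```python
from collections import deque


def bfs_counts(adj, s):
    """Distances from s and number of shortest paths from s to each reachable vertex."""
    dist = {s: 0}
    sigma = {s: 1}
    queue = deque([s])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[v] + 1:
                sigma[w] += sigma[v]  # every shortest path to v extends to w
    return dist, sigma


def betweenness(adj):
    """b_i = sum of D_i(j,k)/D(j,k) over connected unordered pairs (j,k), j != i != k."""
    info = {v: bfs_counts(adj, v) for v in adj}
    nodes = sorted(adj)
    b = {v: 0.0 for v in adj}
    for a, j in enumerate(nodes):
        dist_j, sigma_j = info[j]
        for k in nodes[a + 1:]:
            if k not in dist_j:
                continue  # j and k are not connected
            for i in nodes:
                if i == j or i == k or i not in dist_j:
                    continue
                dist_i, sigma_i = info[i]
                # i lies on a shortest j-k path iff d(j,i) + d(i,k) = d(j,k)
                if k in dist_i and dist_j[i] + dist_i[k] == dist_j[k]:
                    b[i] += sigma_j[i] * sigma_i[k] / sigma_j[k]
    return b


# Toy graph: the path 0-1-2-3; only the inner vertices lie between other pairs.
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
b = betweenness(adj)
```

On the path graph the two inner vertices each lie on the shortest paths of two pairs, while the end vertices lie on none.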
All the above quantities have already been considered and analyzed in a series of papers
[21, 22]. The main result of these analyses is the detection of a complex inner structure
resulting in a clustering larger than expected, a power-law behavior of $c(k)$ (i.e.
$c(k) \propto k^{-\omega}$), a power-law behavior of the correlation $k_{nn}(k)$ (i.e.
$k_{nn}(k) \propto k^{-\nu}$) and finally a power-law distribution of the values of the
betweenness $b$ (i.e. $P(b) \propto b^{-\eta}$). On the basis of these analyses, the Internet at
the level of Autonomous Systems (AS) shows a self-similar behavior in all these quantities,
signaling the possible presence of a critical state. Interestingly, none of the statistical
models introduced so far reproduces all such features. The Barabási-Albert (BA) model introduced
the concept of preferential attachment, such that the network grows by the addition of new nodes
that link to older ones with a probability proportional to the degree of the latter. Even if
this simple rule nicely reproduces the degree distribution, it fails to reproduce the
correlations, the centrality and the clustering of the AS system. Some modifications of the
above model give a better qualitative agreement with the real data. In particular, in the
stochastic growth model proposed in [17] a constant fraction of sites is added at every time
step and a substantial rewiring takes place. The fitness model also agrees fairly well with the
shape of the AS distributions. In this model the preferential attachment is weighted through an
individual site fitness.

The above models, however, do not give a precise quantitative prediction of all the properties
measured in the AS network. Here we present a new statistical model giving a better agreement
with the data and linking the microscopic dynamics to the macroscopic evolution. The basic idea
of the model is to allow both the addition of a vertex (with probability $p$) and the addition
of a link (with probability $1 - p$). Typically such a link relates two sites 1, 2 whose degrees
are $k_1$ and $k_2$. It is natural to define a "directionality" in the link. This is defined by
deciding who pays the cost of the connection. This should mimic, for example, users that pay to
get wired to the Internet, the flow of information, etc. In this paper we will not consider
directionality explicitly, leaving the extension to oriented graphs to future work. In general,
we can write the probability of addition of this link as $P(k_1, k_2)$. The specific form can be
directly linked to the microscopic agent strategies. There are two obvious limiting cases. The
case $k_1 = 0$ corresponds to a new site which decides to join the network. The case $k_2 = 0$
corresponds to the creation of a link toward a site not connected to the network. The process is
asymmetric due to the growth rules, which allows one to write $P(k_1, k_2)$ in terms of
conditional probabilities. If site 1 pays for the connection, then
$P(k_1, k_2) = P_2(k_2|k_1) P_1(k_1)$. A simple ansatz, corresponding to the BA model, would be
to assume that $P_1(k_1) = \delta_{k_1,0}$ and $P_2(k_2|k_1) \propto k_2$. The generalization
presented in Ref. [20] assumes instead $P_2(k_2|k_1) = k_2 + \lambda$ ($k_2$ is the in-degree)
and $P_1(k_1) = k_1 + \mu$ ($k_1$ is the out-degree).

Here we propose to consider non-oriented graphs (as is the case for the AS network). Furthermore
we assume $P_1(k_1) \propto k_1$ and we tune the form of $P_2(k_2|k_1)$ in order to obtain the
different situations observed in the experimental data [21, 22]. For example, a form of the type

$$P_2(k_2|k_1) \propto \frac{1}{|k_1 - k_2| + 1} \qquad (3)$$

would produce the so-called assortative mixing [23], where vertices of the same degree tend to
be connected to each other.

We instead focus on the opposite limit, where

$$P_2(k_2|k_1) \propto |k_1 - k_2| \qquad (4)$$

in order to model the so-called disassortative mixing [23] between large hubs and poorly
connected vertices. In this limit, at any time step

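1. Either a vertex is added and linked with vertex $i$ with probability

$$p \, \frac{k_i}{\sum_{j=1}^{N} k_j} \qquad (5)$$

2. or an edge is added (if absent) between vertices $i$ and $j$ already present, with
probability

$$(1 - p) \, \frac{k_i}{\sum_{l=1}^{N} k_l} \, P_2(k_j|k_i) \qquad (6)$$

with $P_2$ given by Eq. (4).
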
From the above rules, the case $p = 1$ (no edge creation) corresponds to a traditional BA model
where only one edge is added per time step. Intuitively, as the parameter $p$ is tuned toward 0
the edge growth becomes more and more important. This results in a larger connected core with
respect to the BA model, as shown in Fig. 1.
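
The growth rules can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used for the simulations reported here: the seed graph (a single edge), the normalization of $P_2$ over the vertices not yet linked to $i$, the choice of skipping a step when no admissible partner exists, and the name `grow_network` are all hypothetical. Each edge step costs a time linear in the number of vertices, so the sketch is suited to small networks only.

```python
import random


def grow_network(steps, p, seed=None):
    """Grow an undirected graph; returns the adjacency as {vertex: set(neighbours)}."""
    rng = random.Random(seed)
    adj = {0: {1}, 1: {0}}  # assumed seed graph: a single edge
    # Each vertex appears in `ends` once per incident edge, so a uniform draw
    # from this list selects vertex i with probability k_i / sum_j k_j.
    ends = [0, 1]

    def add_edge(u, v):
        adj[u].add(v)
        adj[v].add(u)
        ends.extend((u, v))

    for _ in range(steps):
        i = rng.choice(ends)  # preferential choice of the vertex i
        if rng.random() < p:
            # Vertex step: a new vertex is linked to i.
            new = len(adj)
            adj[new] = set()
            add_edge(new, i)
            continue
        # Edge step: candidates j are the vertices not yet linked to i,
        # weighted by |k_i - k_j| (disassortative rule).
        k_i = len(adj[i])
        cands = [j for j in adj if j != i and j not in adj[i]]
        weights = [abs(k_i - len(adj[j])) for j in cands]
        if sum(weights) == 0:
            continue  # assumption: skip the step if no admissible partner exists
        j = rng.choices(cands, weights=weights, k=1)[0]
        add_edge(i, j)
    return adj


# Example run: 2000 time steps at p = 0.5.
g = grow_network(2000, 0.5, seed=1)
mean_degree = sum(len(nbrs) for nbrs in g.values()) / len(g)
```

For $p = 1$ the sketch reduces to a growing tree with one new vertex and one new edge per step. For $p < 1$ the occasional skipped steps mean that the number of edges grows slightly more slowly than one per time step.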

Numerical simulations of this model are presented below for different cases. The quantities we
monitor are the degree distribution $P(k)$, the distributions of both the betweenness $b$ and
the closeness $c$, the clustering coefficient $c(k)$ of vertices whose degree is $k$, and
finally the average degree $k_{nn}(k)$ of the neighbours of a vertex whose degree is $k$. All
these quantities, together with experimental data for the AS system, are reported in Table I.

As reported in Fig. 2, the degree distribution $P(k)$ follows a power-law behavior for every
value of the parameter $p$.

In particular, the exponent $\gamma$ diminishes as loops start to form in the system when
$p < 1$. Using this numerical evidence about the shape of $P(k)$, we can present an analytical
estimate of the exponent $\gamma$ through simple arguments. We first notice that the number of
edges in the system increases by one at every time step. Consequently the total degree over the
network increases by two:

$$\sum_{k=1}^{\infty} k N_k(t) = 2t. \qquad (7)$$

Here $N_k(t)$ is the number of vertices with degree $k$ at time $t$.
The total number of vertices, instead, increases at a rate $p$. Therefore the total number of
vertices is

$$\sum_{k=1}^{\infty} N_k(t) = p t. \qquad (8)$$

We assume that there is a stationary state, so that the number of vertices of each degree grows
linearly in time, $N_k(t) = n_k t$. As stated above, we also assume that the degree distribution
is a power law, $n_k = a k^{-\gamma}$, as seen from simulations and in agreement with the
preferential attachment rule. Then we can write

$$\sum_k n_k \simeq a \int_1^{\infty} k^{-\gamma} \, dk = \frac{a}{\gamma - 1} = p \qquad (9)$$

and

$$\sum_k k \, n_k \simeq a \int_1^{\infty} k^{-\gamma + 1} \, dk = \frac{a}{\gamma - 2} = 2. \qquad (10)$$

Eqs. (9) and (10) give $a = p(\gamma - 1) = 2(\gamma - 2)$, from which we obtain
$\gamma(p) = 2 + \frac{p}{2 - p}$. This provides good estimates for the results of the
simulations and correctly recovers the limiting case $\gamma(1) = 3$. A striking feature of the
model is that as soon as $p$ is different from 0 one still deals with a scale-free network. The
limit value for the exponent is $\gamma(0^+) = 2$. The case $p = 0$ is degenerate, giving rise
to a complete graph whose degree distribution is a delta function peaked around the size $N$ of
the system. One can argue that this limit is peculiar. Indeed a complete graph of $N$ nodes is
characterized by a number of edges of order $N^2$.
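
The analytical estimate is easy to evaluate numerically. The short sketch below tabulates $\gamma(p) = 2 + p/(2 - p)$ together with the mean degree $\langle k \rangle = 2t/(pt) = 2/p$ that follows from Eqs. (7) and (8). The function names are chosen for this example only.

```python
def gamma_estimate(p):
    """Degree exponent from Eqs. (9) and (10): gamma = 2 + p / (2 - p), 0 < p <= 1."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    return 2.0 + p / (2.0 - p)


def mean_degree(p):
    """<k> = (total degree) / (number of vertices) = 2t / (p t) = 2 / p."""
    return 2.0 / p


for p in (0.1, 0.5, 1.0):
    print(f"p={p:.1f}  gamma={gamma_estimate(p):.2f}  <k>={mean_degree(p):.1f}")
# prints:
# p=0.1  gamma=2.05  <k>=20.0
# p=0.5  gamma=2.33  <k>=4.0
# p=1.0  gamma=3.00  <k>=2.0
```

These values can be compared with the simulated $\gamma$ and $\langle k \rangle$ listed in Table I.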

In our model, for arbitrarily small but strictly positive $p$, both the number of nodes and the
number of edges grow linearly in time, with a fixed ratio, so that the graph will never be
complete. As regards the distance distribution, we find the small-world effect, that is, a peak
around a characteristic value $\langle d \rangle$.

It has recently been shown that the probability distribution $P(b)$ of the betweenness $b$
follows a power law,

$$P(b) \propto b^{-\eta}, \qquad (11)$$

where $\eta$ is about 2. For our model we find, in agreement with Ref. [27], that the exponent
$\eta$ is equal to 2.0 if $p = 1$. From the data for $p \neq 1$ we can conclude that the
exponent changes to $\eta = 2.2$, as happens for the BA model when $m > 1$ and loops start to
appear in the network (see inset of Fig. 2). Another important measure of centrality is the
closeness $c$. The closeness of a site $i$ is simply the inverse of the sum of the distances
from $i$ to all the other vertices. Not surprisingly, since the distance distribution shows a
small-world effect, this quantity has a frequency distribution that decreases exponentially.
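
The closeness defined above can be computed with one breadth-first search per vertex. The following sketch assumes a dictionary-of-sets adjacency and, as a convention of this example only, restricts the sum to the vertices reachable from $i$ and assigns closeness 0 to isolated vertices.

```python
from collections import deque


def closeness(adj):
    """c_i = 1 / sum_j d(i, j), with the sum over the vertices reachable from i."""
    result = {}
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total = sum(dist.values())
        result[s] = 1.0 / total if total > 0 else 0.0
    return result


# Toy graph: the path 0-1-2-3; the inner vertices are closer to everything else.
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
c = closeness(adj)
```

On the path graph an end vertex has distances 1, 2, 3 to the others (closeness 1/6), while an inner vertex has distances 1, 1, 2 (closeness 1/4).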

It is interesting to study how the clustering is structured with respect to the degree. In
particular, we checked the behavior of $c(k)$, defined as the average clustering coefficient of
a site whose degree is $k$. This quantity can also be fitted with a power law,
$c(k) \sim k^{-\omega}$, as shown in Fig. 3. The model for $p = 1.0$ is a BA tree, and therefore
by definition (since no loops are present) the clustering coefficient is always zero. In the BA
model with $m$ larger than 1, instead, loops are present and $c(k)$ is flat with respect to $k$.
A very similar behavior can be found for $\langle k_{nn}(k) \rangle$ (inset of Fig. 3). Again we
have a power law of the form $\langle k_{nn}(k) \rangle \sim k^{-\nu}$ for a large probability
of rewiring ($p \ll 1$), while this structuring disappears when $p = 1$ (see Refs.
[21, 22, 23]).

In conclusion, we presented a model whose topological features depend on the parameter $p$,
which tunes the ratio of vertex creation to edge creation. Interestingly, we find that for
$p = 0.5(1)$ the model nicely reproduces most of the properties measured in the real case. It is
then very tempting to assume that substantial rewiring among existing routers is the key
ingredient that makes the statistical properties of Internet networks so different from those of
other growing networks.

FIG. 1: Plot of graphs obtained for different values of $p$. Above, a graph with $p = 1$,
corresponding to a BA tree; below, the rewiring produced by a $p = 0.5$ simulation gives rise
to a more connected structure. Pictures made with Pajek.

FIG. 2: Plot of the degree distribution for various values of $p$. In the inset, the integrated
betweenness distribution. For the latter, only the symbols for $p = 0.1$, $p = 0.7$, $p = 0.9$,
$p = 1.0$ have been explicitly plotted.




FIG. 3: Plot of the average clustering coefficient for vertices whose degree is $k$. In the
inset, the average degree of the nearest neighbours of a vertex whose degree is $k$.



TABLE I: Data from numerical simulation of the model. The last row refers to the Internet (AS)
network. A dash marks an entry that is not reported.

p     <k>       gamma     2 + p/(2-p)   <d>       eta      omega [c(k)]   nu [k_nn(k)]
0.1   19.9(5)   2.15(5)   2.05          2.8(3)    2.1(1)   0.8(1)         0.5(1)
0.2   10.0(3)   2.2(1)    2.11          2.9(3)    2.1(1)   0.8(2)         0.5(1)
0.3   6.6(3)    2.3(2)    2.18          3.0(2)    2.1(2)   0.7(2)         0.5(1)
0.4   5.0(2)    2.3(3)    2.25          3.1(2)    2.2(1)   0.7(2)         0.5(2)
0.5   4.0(2)    2.5(2)    2.33          3.4(2)    2.2(1)   0.6(3)         0.5(2)
0.6   3.3(2)    2.5(2)    2.43          3.9(3)    2.1(1)   0.5(4)         0.5(3)
0.7   2.8(1)    2.6(1)    2.54          4.4(3)    2.3(2)   -              -
0.8   2.5(1)    2.7(1)    2.67          5.5(1)    2.1(1)   -              -
0.9   2.2(1)    2.9(1)    2.82          6.5(4)    2.2(2)   -              -
1.0   2.0(1)    3.0(1)    3.00          8.7(3)    2.0(1)   -              -
AS    3.8(1)    2.22(1)   -             4.16(1)   2.2(1)   0.75           0.55